Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease
From a fresh data science perspective, this thesis discusses the prediction
of coronary artery disease based on genetic variations at the DNA base pair
level, called Single-Nucleotide Polymorphisms (SNPs), collected from the
Ontario Heart Genomics Study (OHGS).
First, the thesis explains two commonly used supervised learning algorithms,
the k-Nearest Neighbour (k-NN) and Random Forest classifiers, and includes a
complete proof that the k-NN classifier is universally consistent in any finite
dimensional normed vector space. Second, the thesis introduces two
dimensionality reduction steps, Random Projections, a known feature extraction
technique based on the Johnson-Lindenstrauss lemma, and a new method termed
Mass Transportation Distance (MTD) Feature Selection for discrete domains.
Then, this thesis compares the performance of Random Projections with the k-NN
classifier against MTD Feature Selection with Random Forest for predicting
coronary artery disease, based on accuracy, the F-Measure, and area under the Receiver
Operating Characteristic (ROC) curve.
The comparative results demonstrate that MTD Feature Selection with Random
Forest is vastly superior to Random Projections and k-NN. The Random Forest
classifier is able to obtain an accuracy of 0.6660 and an area under the ROC
curve of 0.8562 on the OHGS genetic dataset, when 3335 SNPs are selected by MTD
Feature Selection for classification. This area is considerably better than the
previous high score of 0.608 obtained by Davies et al. in 2010 on the same
dataset.
Comment: This is a Master of Science in Mathematics thesis under the
supervision of Dr. Vladimir Pestov and Dr. George Wells, submitted on January
31, 2014 at the University of Ottawa; 102 pages and 15 figures
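The Random Projections plus k-NN pipeline described in the abstract above can be illustrated with a minimal sketch. It is not the thesis code: the synthetic genotype-like matrix, the target dimension of 50, `k=5`, and the helper names `random_projection` and `knn_predict` are all assumptions chosen for illustration.

```python
import numpy as np

def random_projection(X, target_dim, rng):
    """Project the rows of X to target_dim dimensions with a Gaussian matrix."""
    n_features = X.shape[1]
    # Entries drawn from N(0, 1/target_dim) so that squared Euclidean
    # distances are preserved in expectation (Johnson-Lindenstrauss style).
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(n_features, target_dim))
    return X @ R, R

def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote k-NN with Euclidean distance; labels must be 0 or 1."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)

rng = np.random.default_rng(0)
# Hypothetical stand-in for SNP data: 200 samples, 1000 features in {0, 1, 2}.
X = rng.integers(0, 3, size=(200, 1000)).astype(float)
y = rng.integers(0, 2, size=200)

Z, R = random_projection(X, 50, rng)
pred = knn_predict(Z[:150], y[:150], Z[150:], k=5)
```

Because the labels here are random, the predictions carry no signal; the sketch only shows the data flow from high-dimensional features through the projection into the classifier.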
Swarm Differential Privacy for Purpose Driven Data-Information-Knowledge-Wisdom Architecture
Privacy protection has recently been in the spotlight in both
academia and industry. Society protects individual data privacy through complex
legal frameworks. The increasing number of applications of data science and
artificial intelligence has resulted in a higher demand for the ubiquitous
application of the data. The privacy protection of the broad
Data-Information-Knowledge-Wisdom (DIKW) landscape, the next generation of
information organization, has taken a secondary role. In this paper, we will
explore DIKW architecture through the applications of the popular swarm
intelligence and differential privacy. As differential privacy has proved to be an
effective data privacy approach, we will look at it from a DIKW domain
perspective. Swarm Intelligence can effectively optimize and reduce the number
of items in DIKW used in differential privacy, thus improving both the
effectiveness and the efficiency of differential privacy across multiple
modalities of conceptual DIKW. The proposed approach is demonstrated through an
application to personalized data based on the open-source IRIS dataset.
This experiment demonstrates the efficiency of Swarm Intelligence in reducing
computing complexity.
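For readers unfamiliar with differential privacy, the following is a minimal sketch of the standard Laplace mechanism applied to a bounded mean. It is not the swarm-based method of the paper above; the function name `laplace_mean`, the clipping bounds, and the synthetic measurements are assumptions made for illustration.

```python
import numpy as np

def laplace_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially-private estimate of the mean.

    Values are clipped to [lower, upper] so that replacing one record
    changes the mean by at most (upper - lower) / n.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
# Hypothetical measurements (for example, lengths in cm), not real IRIS data.
values = rng.uniform(4.0, 8.0, size=150)
private_mean = laplace_mean(values, lower=0.0, upper=10.0, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means stronger privacy and more noise. Reducing the number of items that must be released, as the abstract proposes, lowers the total privacy budget that has to be spent.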
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
We present Unicoder, a universal language encoder that is insensitive to
different languages. Given an arbitrary NLP task, a model can be trained with
Unicoder using training data in one language and directly applied to inputs of
the same task in other languages. Compared to similar efforts such as
Multilingual BERT and XLM, three new cross-lingual pre-training tasks are
proposed, including cross-lingual word recovery, cross-lingual paraphrase
classification and cross-lingual masked language model. These tasks help
Unicoder learn the mappings among different languages from more perspectives.
We also find that doing fine-tuning on multiple languages together can bring
further improvement. Experiments are performed on two tasks: cross-lingual
natural language inference (XNLI) and cross-lingual question answering (XQA),
where XLM is our baseline. On XNLI, 1.8% averaged accuracy improvement (on 15
languages) is obtained. On XQA, which is a new cross-lingual dataset built by
us, 5.5% averaged accuracy improvement (on French and German) is obtained.
Comment: Accepted to EMNLP2019; 10 pages, 2 figures
Rehabilitation treatment of multiple sclerosis
Multiple sclerosis is a slowly progressive disease. Immunosuppressants and other drugs can delay its progression, but most patients will be left with varying degrees of neurological deficit, such as muscle weakness, muscle spasm, ataxia, sensory impairment, dysphagia, cognitive dysfunction, and psychological disorders. From the early stage of the disease through the stage of disease progression, professional rehabilitation treatment can reduce functional impairment in multiple sclerosis patients, improve neurological function, and reduce family and social burdens. With the development of new rehabilitation technologies such as transcranial magnetic stimulation, virtual reality, robot-assisted gait training, telerehabilitation, and transcranial direct current stimulation, the advantages of rehabilitation therapy in multiple sclerosis treatment have been further established, and more treatment options have become available to patients.
CRAFTS for Fast Radio Bursts: Extending the dispersion-fluence relation with new FRBs detected by FAST
We report three new FRBs discovered by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), namely FRB 181017.J0036+11, FRB 181118, and FRB 181130, through the Commensal Radio Astronomy FAST Survey (CRAFTS). Together with FRB 181123, which was reported earlier, all four FAST-discovered FRBs share the same characteristics of low fluence ($\leq 0.2$ Jy ms) and high dispersion measure (DM, $> 1000$ pc cm$^{-3}$), consistent with the anticorrelation between DM and fluence of the entire FRB population. FRB 181118 and FRB 181130 exhibit band-limited features. FRB 181130 is prominently scattered ($\tau_s \approx 8$ ms) at 1.25 GHz. FRB 181017.J0036+11 has full-bandwidth emission with a fluence of 0.042 Jy ms, which is one of the faintest FRB sources detected so far. CRAFTS has started to build a new sample of FRBs that fills the region for more distant and fainter FRBs in the fluence-$\mathrm{DM_E}$ diagram, previously out of reach of other surveys. The implied all-sky event rate of FRBs is $1.24^{+1.94}_{-0.90} \times 10^5$ sky$^{-1}$ day$^{-1}$ at the 95% confidence interval above 0.0146 Jy ms. We also demonstrate here that the probability density function of CRAFTS FRB detections is sensitive to the assumed intrinsic FRB luminosity function and cosmological evolution, which may be further constrained with more discoveries.
CRAFTS for Fast Radio Bursts: Extending the dispersion-fluence relation with new FRBs detected by FAST
We report three new FRBs discovered by the Five-hundred-meter Aperture
Spherical radio Telescope (FAST), namely FRB 181017.J0036+11, FRB 181118 and
FRB 181130, through the Commensal Radio Astronomy FAST Survey (CRAFTS).
Together with FRB 181123, which was reported earlier, all four FAST-discovered
FRBs share the same characteristics of low fluence ($\leq 0.2$ Jy ms) and high
dispersion measure (DM, $> 1000$ pc cm$^{-3}$), consistent with the anti-correlation
between DM and fluence of the entire FRB population. FRB 181118 and FRB 181130
exhibit band-limited features. FRB 181130 is prominently scattered
($\tau_s \approx 8$ ms) at 1.25 GHz. FRB 181017.J0036+11 has full-bandwidth
emission with a fluence of 0.042 Jy ms, which is one of the faintest FRB
sources detected so far. CRAFTS has started to build a new sample of FRBs that fills
the region for more distant and fainter FRBs in the fluence-$\mathrm{DM_E}$ diagram,
previously out of reach of other surveys. The implied all-sky event rate of
FRBs is $1.24^{+1.94}_{-0.90} \times 10^5$ sky$^{-1}$ day$^{-1}$ at the 95%
confidence interval above 0.0146 Jy ms. We also demonstrate here that the
probability density function of CRAFTS FRB detections is sensitive to the
assumed intrinsic FRB luminosity function and cosmological evolution, which may
be further constrained with more discoveries.
Comment: 9 pages, 4 plots and 1 table. Accepted by The Astrophysical Journal Letters
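To give a sense of what a dispersion measure above 1000 pc cm$^{-3}$ means observationally, the sketch below evaluates the standard cold-plasma dispersion delay between two observing frequencies. It is not taken from the paper: the band edges of 1.0 and 1.5 GHz and the function name `dispersion_delay_ms` are assumptions chosen for illustration.

```python
# Dispersion constant in ms GHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM_MS = 4.148808

def dispersion_delay_ms(dm, freq_lo_ghz, freq_hi_ghz):
    """Arrival-time delay (ms) of the low frequency relative to the high one.

    dm is the dispersion measure in pc cm^-3; frequencies are in GHz.
    """
    return K_DM_MS * dm * (freq_lo_ghz ** -2 - freq_hi_ghz ** -2)

# Hypothetical burst with DM = 1000 pc cm^-3 observed across 1.0-1.5 GHz.
delay = dispersion_delay_ms(1000.0, 1.0, 1.5)
print(f"delay across the band: {delay / 1000.0:.2f} s")
```

The delay scales linearly with DM, so a high-DM burst sweeps across the band over seconds rather than milliseconds, which is why searches must de-disperse the data over a range of trial DMs.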